Virtual machines, virtually everywhere

Twenty years ago, almost to the day, Amazon Web Services (AWS) launched Simple Storage Service (S3). A few months later, the company’s Elastic Compute Cloud (EC2) service opened for public beta testing before rolling out officially in 2008. These events sparked the era of modern on-demand cloud storage and computing that changed how organizations of all sizes think about their IT infrastructure.

Fast-forward to the present and you would be hard-pressed to find many organizations that haven’t ‘lifted and shifted’ at least part of their workloads to the cloud, or aren’t planning to do so soon. Indeed, some now run entirely in the cloud, while many others have paired cloud workloads, often in multi-cloud setups, with on-prem resources that won’t be retired anytime soon.

Of all the things that these organizations have in common, one warrants a closer look: virtual machine (VM) sprawl, or uncontrolled growth of virtual machines that are often left to fend for themselves.

A sprawling problem

Public cloud service providers (CSPs) make provisioning new VMs frictionless by design; after all, this is partly what makes their offering so appealing in the first place. As many admins can attest, a new VM instance can be stood up within moments, but decommissioning it rarely gets the same urgency.

In many companies, especially those with multi-cloud setups involving AWS, Azure, GCP and/or other CSPs, this sprawl results in a growing stockpile of workloads that exist outside security operations. CSPs do provide baseline protections, but the ongoing work falls on the customer. The machines often don’t even receive operating system updates; worse, they’re generally unmonitored and subject to access policies that haven’t changed since the day someone created the instance. This increases the risk that a virtual machine will ‘go rogue’ while remaining under the radar – until it’s too late.

Cloud visibility is itself a persistent problem: only about 23% of organizations report having a comprehensive view of their cloud footprint. Unchecked growth of assets, including fleets of VMs, is a big part of the problem. The staple attack paths – misconfigured storage buckets and exposed APIs – dominate breach disclosures, in part because they produce public-facing signals. Meanwhile, VM abuse happens more subtly and inside an environment; a managed identity querying cloud storage won’t set off the same alarms as an external IP address attempting to log in.

A recent report by the Cloud Security Alliance (CSA) ranked misconfiguration and inadequate change control as the top threat to cloud resources, followed by identity and access management (IAM) weaknesses. This tracks with the identity-driven nature of cloud workloads, where both the VM itself and what it can access deserve scrutiny. According to Microsoft’s 2024 State of Multicloud Security Report, workload identities assigned to VMs and other non-human resources vastly outnumber human identities, and the gap is only widening as organizations spin up more compute resources.

The reality is rather mundane – say, a machine learning engineer provisions a VM for data processing tasks. The VM is granted an identity, but since scoping its permissions in keeping with the principle of least privilege would be too time-consuming, it receives broad read/write access to data storage and other resources. The project wraps up, but the over-permissioned VM is ‘left to its own devices.’
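To make the least-privilege point concrete, here is a minimal sketch of how a team might flag overly broad grants on a VM's identity. It assumes an AWS-style JSON policy document (Effect, Action, Resource); other providers model permissions differently. The function name and the example bucket are hypothetical and used only for illustration.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Policy fields may hold a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def broad_statements(policy: dict) -> list:
    """Return the Allow statements that use a wildcard action or resource.

    Assumes an AWS-style policy layout; this is a heuristic, not a full
    policy evaluator (conditions and Deny statements are ignored).
    """
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = any(r == "*" for r in resources)
        if wildcard_action or wildcard_resource:
            flagged.append(stmt)
    return flagged


# Hypothetical policy attached to a data-processing VM's identity.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Broad grant of the kind described above: every storage action, everywhere.
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        # Narrow grant: read-only access to a single (made-up) bucket.
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::training-data/*",
        },
    ],
}

for stmt in broad_statements(example_policy):
    print("Overly broad:", stmt["Action"], "on", stmt["Resource"])
```

A check like this only surfaces candidates for review; deciding what the workload actually needs still takes someone who knows the project.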

Left to rot

An abandoned VM can do more than ‘collect dust’, however. Since every VM is bound to some form of identity that determines what the workload can access across the environment, forgotten instances may be exploited by bad actors to gain an initial foothold. As VMs in the same virtual private cloud (VPC) or virtual network (VNet) can often talk to each other in the ‘east-west’ direction without much restriction, a VM can probe adjacent instances, reach internal databases or storage endpoints, and exploit whatever permissions it was granted. Far too often, network micro-segmentation turns out to be too daunting a task.
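As a rough illustration of what an ‘east-west’ review could look for, the sketch below flags ingress rules that let a wide internal address range reach many ports. The IngressRule type, the thresholds and the sample rules are all assumptions made for this example; real security group or network security group exports have more fields and would need mapping first.

```python
import ipaddress
from dataclasses import dataclass

# RFC 1918 private IPv4 ranges, used here as a stand-in for "internal" sources.
_INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one inbound firewall rule."""
    source_cidr: str
    from_port: int
    to_port: int


def is_permissive_east_west(rule: IngressRule, min_prefix: int = 24, max_ports: int = 1) -> bool:
    """True if an internal source wider than /min_prefix can reach more than max_ports ports."""
    net = ipaddress.ip_network(rule.source_cidr, strict=False)
    if net.version != 4:
        return False  # IPv6 is out of scope for this sketch
    internal = any(net.subnet_of(rng) for rng in _INTERNAL_RANGES)
    wide_source = net.prefixlen < min_prefix
    wide_ports = (rule.to_port - rule.from_port + 1) > max_ports
    return internal and wide_source and wide_ports


# Made-up rules for illustration.
rules = [
    IngressRule("10.0.0.0/16", 0, 65535),   # whole VPC range, every port
    IngressRule("10.0.1.5/32", 5432, 5432), # one host, one database port
    IngressRule("10.0.0.0/16", 443, 443),   # whole VPC range, single port
]

for rule in rules:
    if is_permissive_east_west(rule):
        print("Review:", rule)
```

The thresholds are arbitrary starting points; the useful part is having any repeatable check at all when full micro-segmentation is out of reach.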

In hybrid environments involving hybrid identities, things can get even more complicated. For example, when on-prem Active Directory is synced with Entra ID, a compromised VM in Azure that’s joined to an Entra ID tenant may be able to reach file shares, databases, applications or other resources that are part of the organization’s core on-prem infrastructure.

Examples of actual attacks involving VMs aren’t hard to come by. In one campaign, attackers moved between AWS EC2 instances over internal Remote Desktop Protocol (RDP), staged hundreds of gigabytes of exfiltrated data across multiple VMs, and unleashed ransomware inside the cloud network. Monitoring did catch the activity, but automated response wasn’t properly set up to stop it and the ransomware deployment went ahead.

Other attackers are exploiting the very ease with which VMs can be spun up. Microsoft has documented a campaign in which compromised Azure accounts were misused to provision short-lived VMs as throwaway attack infrastructure. Since the traffic came from legitimate, Azure-associated IP addresses, the alerts were dismissed as false positives.

Fighting deploy and decay

Chances are that your IT and security teams are small and handle security alongside other IT responsibilities, which limits the kind of tooling that works at this scale. Security products that rely on deep platform-specific expertise, complex deployment procedures and a number of tools for managing various parts of the IT infrastructure may not fit the bill. They may even miss the part of the sprawl problem that matters most.

Muddying the waters further, what happens when an incident involves identity abuse? An attacker on a rogue VM may not be doing anything that looks suspicious from inside the VM alone when using its identity to access cloud or on-prem resources. Catching the anomaly requires connecting what’s happening on the VM itself to what the VM’s identity is doing across the wider environment. That kind of correlation hinges on integration with identity solutions like Entra ID and Active Directory.
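The correlation described here can be pictured as a join between two event streams. The sketch below is a simplified illustration, not how any particular product works: it assumes host alerts and identity access logs have already been collected as plain tuples, and that a mapping from VM to workload identity is available. All names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def correlate(host_alerts, identity_events, vm_identity, window=timedelta(minutes=30)):
    """Pair each host alert with what the VM's identity did shortly afterwards.

    host_alerts:     iterable of (vm_id, alert_time)
    identity_events: iterable of (identity, resource, event_time)
    vm_identity:     dict mapping vm_id to the identity assigned to that VM
    """
    identity_events = list(identity_events)
    hits = []
    for vm_id, alert_time in host_alerts:
        identity = vm_identity.get(vm_id)
        if identity is None:
            continue  # VM has no known identity; nothing to correlate
        for ev_identity, resource, ev_time in identity_events:
            in_window = alert_time <= ev_time <= alert_time + window
            if ev_identity == identity and in_window:
                hits.append((vm_id, identity, resource, ev_time))
    return hits


# Made-up sample data.
vm_identity = {"vm-ml-01": "mi-data-processing"}
host_alerts = [("vm-ml-01", datetime(2025, 3, 1, 2, 0))]
identity_events = [
    ("mi-data-processing", "storage://finance-archive", datetime(2025, 3, 1, 2, 10)),
    ("mi-data-processing", "storage://training-data", datetime(2025, 2, 27, 14, 0)),
    ("mi-other", "storage://finance-archive", datetime(2025, 3, 1, 2, 5)),
]

for hit in correlate(host_alerts, identity_events, vm_identity):
    print(hit)
```

Neither signal means much alone: the host alert may be low severity and the storage access is technically permitted. It is the pairing that turns them into something worth investigating.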

There’s also the question of speed. When a compromised cloud workload can reach on-prem resources through a federated identity chain, the window between initial compromise and serious damage can be short. Isolating a VM, ideally automatically, before lateral movement begins has to be possible at any hour. It’s one of the scenarios where AI-driven correlation and runtime detection earn their keep – no one can watch every workload around the clock and respond quickly enough.

Successful incursions cost businesses dearly. According to a recent survey, one in three SMBs reported being hit with substantial fines following a cyberattack. It’s also a reminder that non-compliance may come with direct financial consequences. Regulatory frameworks such as NIST SP 800-53 and PCI DSS 4.0 are getting more specific about cloud workload security, and companies are increasingly expected to ensure that the identities assigned to cloud workloads are scoped appropriately and monitored continuously. Demonstrating access controls on the servers hosting sensitive data isn’t enough when the risk resides at the identity layer.

Meanwhile, IBM’s Cost of a Data Breach 2025 report found that 30 percent of breaches affected data strewn across multiple environments, which shows how hard it is to defend assets that are spread across them. A meaningful share of the resulting cost traces to the length of time between infiltration and detection, also known as dwell time. Organizations that can’t see what’s happening inside their environments tend to discover breaches through ‘external’ signals, such as a customer complaint, by which point the attacker has had weeks or months of access.

Parting thoughts

VMs are among the oldest and most widely deployed cloud resources. VM sprawl accumulates quietly and often reveals itself only after something has gone wrong. The unprotected workloads carry identities and communicate with one another and with on-prem resources in traffic patterns that not all security controls can observe and catch.

For starters, every organization needs to inventory its VM fleets across all cloud platforms, review the permissions attached to the identity of each VM, and audit each VM’s network settings for unnecessary ‘east-west’ and ‘north-south’ openness. Good fences make for good neighbors, as the saying goes.
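The inventory step can start small. The sketch below assumes the fleet has already been exported from each provider into one flat list; the VmRecord fields, the 90-day idle threshold and the sample machines are illustrative assumptions rather than a recommended standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical, provider-neutral inventory entry for one VM."""
    name: str
    provider: str
    owner: Optional[str]      # e.g. taken from an "owner" tag; None if untagged
    last_activity: datetime   # last login, deployment or other sign of use


def find_sprawl_candidates(fleet, now, max_idle=timedelta(days=90)):
    """Return VMs that have no owner or have been idle longer than max_idle."""
    return [
        vm for vm in fleet
        if not vm.owner or (now - vm.last_activity) > max_idle
    ]


# Made-up fleet spanning several providers.
now = datetime(2025, 6, 1)
fleet = [
    VmRecord("ml-batch-07", "aws", "data-team", datetime(2024, 11, 3)),  # idle for months
    VmRecord("web-frontend", "azure", "platform", datetime(2025, 5, 30)),
    VmRecord("test-box", "gcp", None, datetime(2025, 5, 28)),            # nobody owns it
]

for vm in find_sprawl_candidates(fleet, now):
    print(f"{vm.provider}/{vm.name}: owner={vm.owner}, last active {vm.last_activity:%Y-%m-%d}")
```

Each VM on the resulting list is then a prompt for the two follow-up questions in the paragraph above: what can its identity access, and what can reach it over the network?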

For organizations running workloads across cloud and on-prem environments, the question is whether their security tooling can keep an eye on VMs with the same rigor as applied to the endpoints on employee desks and other parts of their infrastructure. Only then can they see the full picture and secure their data across various environments.